Statistical Distributions 


Uncertainties in LCA 


Uncertainties in LCA (Subject editor: Andreas Ciroth) 


Representing Statistical Distributions for Uncertain Parameters in LCA 

Relationships between mathematical forms, their representation in EcoSpold, 
and their representation in CMLCA 

Reinout Heijungs 1 * and Rolf Frischknecht 2 

1 Institute of Environmental Sciences (CML), Leiden University, POB 9518, 2300 RA Leiden, The Netherlands 
2 ESU-services, Kanzleistrasse 4, 8610 Uster, Switzerland 


* Corresponding author (heijungs@cml.leidenuniv.nl) 


DOI: http://dx.doi.org/10.1065/lca2004.09.177 

Abstract 

Introduction. Statistical information for LCA is increasingly
becoming available in databases. At the same time, processing of
statistical information is increasingly becoming easier by
software for LCA. A practical problem is that there is no unique
unambiguous representation for statistical distributions. 

Representations. This paper discusses the most frequently
encountered statistical distributions, their representation in
mathematical statistics, EcoSpold and CMLCA, and the relationships
between these representations. 

The distributions. Four statistical distributions are discussed: 
uniform, triangular, normal and lognormal. 

Software and examples. An easy-to-use software tool is
available for supporting the conversion steps. Its use is illustrated
with a simple example. 

Discussion. This paper shows which ambiguities exist for
specifying statistical distributions, and which complications can arise
when uncertainty information is transferred from a database
to an LCA program. This calls for a more extensive
standardization of the vocabulary and symbols to express such
information. We invite suppliers of software and databases to provide
their parameter representations in a clear and unambiguous
way and hope that a future revision of the ISO/TS 14048
document will standardize representation and terminology for
statistical information. 


Keywords: CMLCA; ecoinvent; EcoSpold; ISO-14048; lognormal
distribution; normal distribution; statistical distributions;
triangular distribution; uncertainties; uniform distribution 


Introduction 

Uncertainty calculations in LCA have been made for quite
some years now (see, e.g., Meier (1997), Copius Peereboom
et al. (1999), Maurice et al. (2000), Sonnemann et al. (2003),
Huijbregts et al. (2003)). However, many, if not most, LCA
studies do not perform uncertainty calculations, despite the
generally agreed recommendation that a consideration of
the quality and the robustness of the results of an LCA is
an indispensable part of decision support (Huijbregts
et al. 2004). 


The incorporation of uncertainty calculations as a routine
step in LCA requires the extension of databases and
software that contain and support such information. The present
advent of databases and software for LCA that support
calculations with stochastic input data calls for a review of the
most frequently assumed statistical distributions. These are 

• the uniform distribution; 

• the triangular distribution; 

• the normal or Gaussian distribution; 

• the lognormal distribution. 

Although the mathematical form and properties of these
distributions are well-known, it is often problematic to connect
theory, data, and software. One reason for this is that there is
some freedom in choosing the parameters that describe these
distributions. For instance, a uniform distribution can be
described with a lowest and a highest value, or alternatively with
a mean value and a width or half-width. Of course, getting
the right uncertainty information and deciding which
statistical distribution is appropriate is difficult as well; this problem
is, however, not addressed in this paper. 

The purpose of this technical paper is to describe the
relationship between three representations: 

• the mathematical form; 

• the EcoSpold representation chosen by the ecoinvent database
(an extensive LCA database that includes quantitative uncertainty
information, see, e.g., Frischknecht et al. 2004); 

• the representation chosen by the CMLCA software (an advanced
LCA software tool that includes uncertainty analyses in a
numerical way - by Monte Carlo analysis - and in an analytical way - by
formulae for error propagation). 

Tables with cross-formulae enable a quick translation of one 
form into another form. 

1 Representations 

In this paper, three different representations of statistical
distributions are used: the most often used mathematical
form, the representation in EcoSpold, and the
representation in CMLCA. These three representations should suffice
to understand the relationships involved, and to add similar
information for any other LCI database, set of LCIA
characterization factors, or LCA software. 

1.1 The mathematical form 

There is no unique mathematical representation. Apart from
obvious differences in the choice of symbols in the formulae
(like changing x into y), scales and origins may be shifted as
long as this is done consistently. For instance, a uniform
distribution may be described by a probability density
function having a non-zero value between a and b: 

$$f(x) = \begin{cases} \dfrac{1}{b - a} & a < x < b \\ 0 & \text{otherwise} \end{cases}$$

or equivalently by a probability density function having a
non-zero value in a range that has a in its centre and a
half-width of b: 

$$f(x) = \begin{cases} \dfrac{1}{2b} & a - b < x < a + b \\ 0 & \text{otherwise} \end{cases}$$

Observe that the parameters a and b are used differently in 
these two formulae. In this paper, we have used the book by 
Morgan & Henrion (1990) for the representations in Section 2. 
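
The equivalence of the two parameterizations can be checked numerically. The following is a minimal sketch, not part of the original paper; the function names are hypothetical and chosen only for illustration.

```python
def uniform_pdf_endpoints(x: float, a: float, b: float) -> float:
    """Uniform density with lowest value a and highest value b."""
    return 1.0 / (b - a) if a < x < b else 0.0


def uniform_pdf_centre(x: float, a: float, b: float) -> float:
    """Uniform density with centre a and half-width b."""
    return 1.0 / (2.0 * b) if a - b < x < a + b else 0.0


# The same distribution on the interval (4, 8), written in both conventions:
# endpoints (a=4, b=8) versus centre and half-width (a=6, b=2).
for x in (3.0, 5.0, 7.5, 9.0):
    assert uniform_pdf_endpoints(x, 4.0, 8.0) == uniform_pdf_centre(x, 6.0, 2.0)
```

The symbols a and b carry different meanings in the two functions, which is exactly the ambiguity that arises when parameters are exchanged without naming the convention.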


1.2 The EcoSpold representation 

The ecoinvent parameters (see http://www.ecoinvent.ch/) are
based on the EcoSpold format (see
http://www.ecoinvent.ch/download/EcoSpoldSchema_v1.0.zip), which in turn has its
roots in the Spold 99 format (see
http://www.spold.org/publ/SPOLD99.zip). The EcoSpold format accommodates the
following relevant keywords: 

• uncertaintyType (field¹ 3708; kind of uncertainty distribution) 

• meanValue (field 3707; (arithmetical) mean amount, further
abbreviated as MeanV) 

• minValue (field 3795; minimum value, further abbreviated as MinV) 

• maxValue (field 3796; maximum value, further abbreviated as
MaxV) 

• mostLikelyValue (field 3797; not used in ecoinvent data v1.1) 

• standardDeviation95 (field 3709; the square of the geometric
standard deviation for the lognormal distribution, and the double
standard deviation for the normal distribution, further abbreviated
as SD95). 

¹ The 'field' is a unique identifier number, used in the documentation of
the EcoSpold format. 

1.3 The CMLCA representation 

The CMLCA parameters (see
http://www.leidenuniv.nl/cml/ssp/software/cmlca/) are based on just three variables: 

• value ((arithmetical) mean amount) 

• distribution (kind of uncertainty distribution) 

• uncertainty (some measure of dispersion) 

This latter variable is labeled as follows: 

• sigma (in case of normal distribution); 

• width (in case of uniform and triangular distribution); 

• phi (in case of lognormal distribution). 

The meaning of these variables will be explained later. 

As already mentioned, CMLCA includes analytical
expressions for error propagation. These require, besides the mean
value, the variance $s^2$ of the distribution as a parameter. The
next sections will therefore also contain expressions for $s^2$ in
terms of the (mean) value and the uncertainty parameter. 

In addition to carrying out Monte Carlo simulations and
using analytical formulae for error propagation, CMLCA
also offers a way to add a generic uncertainty value to a
large set of data items simultaneously. This is done on the
basis of the coefficient of variation, which is defined as the
dimensionless ratio between the distribution's standard
deviation and its mean: 

$$CV = \frac{s}{\bar{x}}$$

With a fixed mean value, the dispersion parameter of the
distribution is adjusted so as to satisfy 

$$s = CV \cdot \bar{x}$$

In other words, we need a formula of the form 

$$\mathit{width} = f(\mathit{value}, CV)$$

for the uniform and triangular distribution, 

$$\mathit{sigma} = f(\mathit{value}, CV)$$

for the normal distribution, and 

$$\mathit{phi} = f(\mathit{value}, CV)$$

for the lognormal distribution. Concrete elaborations will
be provided in the subsequent sections. 
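
As an illustration, the mapping from a coefficient of variation to a dispersion parameter can be sketched in Python, using the formulae given for each distribution in Sections 2.1 to 2.4. This is only a sketch under stated assumptions: the function names and the string labels for the distributions are hypothetical, and the code is not taken from CMLCA itself.

```python
import math


def dispersion_from_cv(distribution: str, value: float, cv: float) -> float:
    """Return width, sigma or phi for a given mean value and CV."""
    if distribution == "uniform":
        return 2.0 * math.sqrt(3.0) * cv * value      # width
    if distribution == "triangular":
        return 2.0 * math.sqrt(6.0) * cv * value      # width
    if distribution == "normal":
        return cv * value                             # sigma
    if distribution == "lognormal":
        return math.sqrt(math.log(cv**2 + 1.0))       # phi, independent of value
    raise ValueError(f"unknown distribution: {distribution}")


def variance_from_dispersion(distribution: str, value: float, d: float) -> float:
    """Return the variance s^2 from the mean value and dispersion parameter."""
    if distribution == "uniform":
        return d**2 / 12.0
    if distribution == "triangular":
        return d**2 / 24.0
    if distribution == "normal":
        return d**2
    if distribution == "lognormal":
        return (math.exp(d**2) - 1.0) * value**2
    raise ValueError(f"unknown distribution: {distribution}")


# Round trip: the standard deviation divided by the mean gives back the CV.
for name in ("uniform", "triangular", "normal", "lognormal"):
    d = dispersion_from_cv(name, 10.0, 0.2)
    s = math.sqrt(variance_from_dispersion(name, 10.0, d))
    assert math.isclose(s / 10.0, 0.2)
```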

2 The distributions 

This section will discuss the four statistical distributions that 
are most commonly used in the context of stochastic LCA: 
the uniform, triangular, normal and lognormal distributions. 

2.1 The uniform distribution 

The uniform distribution (see Morgan & Henrion 1990, p. 95)
is a mathematically simple distribution. In EcoSpold, the
keyword UncertaintyType has the value 4 to denote this
distribution. In CMLCA, it is the distribution that is listed as
the second choice, and is represented as U(width). 

It has a probability density function (Fig. 1) of the form 

$$f(x) = \begin{cases} \dfrac{1}{b - a} & a < x < b \\ 0 & \text{otherwise} \end{cases}$$

Its mean value is given by 

$$\bar{x} = \tfrac{1}{2}(a + b)$$

and its variance by 

$$s^2 = \tfrac{1}{12}(b - a)^2$$

Fig. 1: The probability density function of the uniform distribution with
parameters a=4 and b=8 

Table 1 shows how the parameters of the distribution, a
and b, can be transformed into the parameters that are
required or provided by EcoSpold and CMLCA. 

In CMLCA, the coefficient of variation translates into 

$$\mathit{width} = 2\sqrt{3} \times CV \times \mathit{value}$$

For the variance we finally have 

$$s^2 = \tfrac{1}{12}\,\mathit{width}^2$$
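
The conversions of Table 1 can be written as a few small functions. The sketch below is illustrative only; the function names are hypothetical, and the dictionary keys simply reuse the EcoSpold and CMLCA keywords introduced in Section 1.

```python
def uniform_math_to_ecospold(a: float, b: float) -> dict:
    """Mathematical form (a, b) to EcoSpold fields."""
    return {"meanValue": 0.5 * (a + b), "minValue": a, "maxValue": b}


def uniform_ecospold_to_cmlca(min_v: float, max_v: float) -> dict:
    """EcoSpold MinV and MaxV to CMLCA value and width."""
    return {"value": 0.5 * (min_v + max_v), "width": max_v - min_v}


def uniform_cmlca_to_math(value: float, width: float) -> tuple:
    """CMLCA value and width back to the endpoints (a, b)."""
    return value - 0.5 * width, value + 0.5 * width


def uniform_variance(width: float) -> float:
    """Variance s^2 of a uniform distribution with the given width."""
    return width**2 / 12.0


# Round trip for the distribution of Fig. 1 (a=4, b=8).
eco = uniform_math_to_ecospold(4.0, 8.0)
cmlca = uniform_ecospold_to_cmlca(eco["minValue"], eco["maxValue"])
a, b = uniform_cmlca_to_math(cmlca["value"], cmlca["width"])
print(cmlca, (a, b), uniform_variance(cmlca["width"]))
```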

2.2 The triangular distribution 

The symmetric triangular distribution² (see Morgan & Henrion
1990, p. 96) is slightly more complicated than the uniform
distribution. In EcoSpold, the keyword UncertaintyType has
the value 3 to denote this distribution. In CMLCA, it is the
distribution that is listed as the third choice, and is
represented as T(width). 

² Although ecoinvent in principle can accommodate an asymmetric
triangular distribution, it has been excluded from the discussion in this paper,
because Morgan & Henrion (1990) do not discuss it, and because CMLCA
does not support it. 

It has a probability density function (Fig. 2) of the form 

$$f(x) = \begin{cases} \dfrac{b - |x - a|}{b^2} & a - b < x < a + b \\ 0 & \text{otherwise} \end{cases}$$

with $b > 0$. 

Its mean value is given by 

$$\bar{x} = a$$

and its variance by 

$$s^2 = \tfrac{1}{6} b^2$$

Fig. 2: The probability density function of the triangular distribution with
parameters a=6 and b=2 

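Table 1: Relationship between the representations for the uniform distribution 

| From \ To | Mathematical form | EcoSpold | CMLCA |
|---|---|---|---|
| Mathematical form | - | $\mathit{MeanV} = \tfrac{1}{2}(a + b)$; $\mathit{MinV} = a$; $\mathit{MaxV} = b$ | $\mathit{value} = \tfrac{1}{2}(a + b)$; $\mathit{width} = b - a$ |
| EcoSpold | $a = \mathit{MinV}$; $b = \mathit{MaxV}$ | - | $\mathit{value} = \tfrac{1}{2}(\mathit{MinV} + \mathit{MaxV})$ *; $\mathit{width} = \mathit{MaxV} - \mathit{MinV}$ |
| CMLCA | $a = \mathit{value} - \tfrac{1}{2}\mathit{width}$; $b = \mathit{value} + \tfrac{1}{2}\mathit{width}$ | $\mathit{MeanV} = \mathit{value}$; $\mathit{MinV} = \mathit{value} - \tfrac{1}{2}\mathit{width}$; $\mathit{MaxV} = \mathit{value} + \tfrac{1}{2}\mathit{width}$ | - |

* Alternatively: value = MeanV 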
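
Table 2: Relationship between the representations for the triangular distribution 

| From \ To | Mathematical form | EcoSpold | CMLCA |
|---|---|---|---|
| Mathematical form | - | $\mathit{MeanV} = a$; $\mathit{MinV} = a - b$; $\mathit{MaxV} = a + b$ | $\mathit{value} = a$; $\mathit{width} = 2b$ |
| EcoSpold | $a = \tfrac{1}{2}(\mathit{MinV} + \mathit{MaxV})$; $b = \tfrac{1}{2}(\mathit{MaxV} - \mathit{MinV})$ | - | $\mathit{value} = \tfrac{1}{2}(\mathit{MinV} + \mathit{MaxV})$; $\mathit{width} = \mathit{MaxV} - \mathit{MinV}$ |
| CMLCA | $a = \mathit{value}$; $b = \tfrac{1}{2}\mathit{width}$ | $\mathit{MeanV} = \mathit{value}$; $\mathit{MinV} = \mathit{value} - \tfrac{1}{2}\mathit{width}$; $\mathit{MaxV} = \mathit{value} + \tfrac{1}{2}\mathit{width}$ | - |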



Table 2 shows how the parameters of the distribution, a
and b, can be transformed into the parameters that are
required or provided by EcoSpold and CMLCA. 

In CMLCA, the coefficient of variation translates into 

$$\mathit{width} = 2\sqrt{6} \times CV \times \mathit{value}$$

For the variance we finally have 

$$s^2 = \tfrac{1}{24}\,\mathit{width}^2$$
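
The variance formula can be checked with a small Monte Carlo experiment. The sketch below is an illustration rather than part of the original paper: it draws from a symmetric triangular distribution with the parameters of Fig. 2, using Python's standard library, and compares the sample variance with $\mathit{width}^2/24$. The sample size and the seed are arbitrary choices.

```python
import random
import statistics

a, b = 6.0, 2.0          # centre and half-width, as in Fig. 2
width = 2.0 * b          # CMLCA dispersion parameter

rng = random.Random(12345)   # fixed seed, arbitrary, for reproducibility
# random.triangular takes (low, high, mode); the symmetric case has mode = a.
samples = [rng.triangular(a - b, a + b, a) for _ in range(200_000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)

print("mean:    ", sample_mean, "expected", a)
print("variance:", sample_var, "expected", width**2 / 24.0)
```

The sample values agree with the analytical ones only up to sampling error, which shrinks with the square root of the number of draws.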

2.3 The normal distribution 

The normal or Gaussian distribution (see Morgan & Henrion
1990, p. 88) looks mathematically more difficult than the
uniform and triangular distributions, but is in fact easier to
deal with. In EcoSpold, the keyword UncertaintyType has
the value 2 to denote this distribution. In CMLCA, it is the
distribution that is listed as the first choice, and is
represented as N(sigma). 

It has a probability density function (Fig. 3) of the form 

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

with $\sigma > 0$. 

Its mean value is given by 

$$\bar{x} = \mu$$

and its variance by 

$$s^2 = \sigma^2$$

Fig. 3: The probability density function of the normal distribution with
parameters $\mu$=6 and $\sigma$=1 

The ecoinvent documentation for the EcoSpold format gives
as an explanation of the SD95 that it represents the double
standard deviation, s: 

$$SD95 = 2s$$

The factor 2 is in fact the rounded value of 1.96, the
two-sided critical value at significance level 0.95 from a table of
the normal distribution (Abramowitz & Stegun 1972, p. 968). 

Table 3 shows how the parameters of the distribution, $\mu$
and $\sigma$, can be transformed into the parameters that are
required or provided by EcoSpold and CMLCA. 

In CMLCA, the coefficient of variation translates into 

$$\mathit{sigma} = CV \times \mathit{value}$$

For the variance we finally have 

$$s^2 = \mathit{sigma}^2$$
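
Table 3: Relationship between the representations for the normal distribution 

| From \ To | Mathematical form | EcoSpold | CMLCA |
|---|---|---|---|
| Mathematical form | - | $\mathit{MeanV} = \mu$; $SD95 = 2\sigma$ | $\mathit{value} = \mu$; $\mathit{sigma} = \sigma$ |
| EcoSpold | $\mu = \mathit{MeanV}$; $\sigma = \tfrac{1}{2} SD95$ | - | $\mathit{value} = \mathit{MeanV}$; $\mathit{sigma} = \tfrac{1}{2} SD95$ |
| CMLCA | $\mu = \mathit{value}$; $\sigma = \mathit{sigma}$ | $\mathit{MeanV} = \mathit{value}$; $SD95 = 2 \times \mathit{sigma}$ | - |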
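
A short sketch of the EcoSpold and CMLCA conversions for the normal distribution follows, together with a check of how the rounded factor 2 compares with the exact critical value. The function names are hypothetical; `statistics.NormalDist` is part of the Python standard library (version 3.8 and later).

```python
from statistics import NormalDist


def normal_ecospold_to_cmlca(mean_v: float, sd95: float) -> dict:
    """EcoSpold MeanV and SD95 to CMLCA value and sigma."""
    return {"value": mean_v, "sigma": 0.5 * sd95}


def normal_cmlca_to_ecospold(value: float, sigma: float) -> dict:
    """CMLCA value and sigma to EcoSpold meanValue and standardDeviation95."""
    return {"meanValue": value, "standardDeviation95": 2.0 * sigma}


# Round trip for the distribution of Fig. 3 (mu=6, sigma=1).
eco = normal_cmlca_to_ecospold(6.0, 1.0)
back = normal_ecospold_to_cmlca(eco["meanValue"], eco["standardDeviation95"])

std = NormalDist()                        # standard normal, mean 0 and sd 1
z_exact = std.inv_cdf(0.975)              # two-sided 95 % critical value
coverage_2s = std.cdf(2.0) - std.cdf(-2.0)  # probability within +/- 2 sd

print(back, z_exact, coverage_2s)
```

Using 2 instead of 1.96 means that the interval mean ± SD95 covers slightly more than 95 % of the probability mass.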



2.4 The lognormal distribution 

The lognormal distribution (see Morgan & Henrion 1990,
p. 89) is, due to its asymmetry, more difficult than the other
distributions discussed, both in its mathematics and in its
interpretation. Nevertheless, it is a very frequently used
distribution. In EcoSpold, the keyword UncertaintyType has
the value 1 to denote this distribution; this is in fact the
default value. In CMLCA, it is the distribution that is listed
as the fourth choice, and it is represented as L(phi). 

It has a probability density function (Fig. 4) of the form 

$$f(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\phi\,x} \exp\left(-\dfrac{(\ln x - \xi)^2}{2\phi^2}\right) & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

with $\phi > 0$. 

Its mean value is given by 

$$\bar{x} = \exp\left(\xi + \tfrac{1}{2}\phi^2\right)$$

and its variance by 

$$s^2 = \exp(\phi^2) \cdot \left(\exp(\phi^2) - 1\right) \cdot \exp(2\xi)$$

Fig. 4: The probability density function of the lognormal distribution with
parameters $\xi$=1 and $\phi$=0.3 

The ecoinvent documentation for the EcoSpold format gives
as an explanation of the SD95 that it represents the square
of the geometric standard deviation, SDg: 

$$SD95 = SDg^2$$

The exponent 2 is in fact the rounded value of 1.96, the
two-sided critical value at significance level 0.95 from a
table of the lognormal distribution. As most tables do not
specify the cumulative lognormal density, one should use
the cumulative normal density (Abramowitz & Stegun 1972,
p. 968), and perform a logarithmic transformation. The
natural logarithm of the geometric standard deviation is the
standard deviation of the natural logarithm of x (Strom &
Stansbury 2000): 

$$\phi = \ln(SDg)$$

For completeness of interpretation, we also provide
formulae for the median 

$$x_{\text{median}} = \exp(\xi)$$

and the mode 

$$x_{\text{mode}} = \exp(\xi - \phi^2)$$

Table 4 shows how the parameters of the distribution, $\xi$
and $\phi$, can be transformed into the parameters that are
required or provided by EcoSpold and CMLCA. 

In CMLCA, the coefficient of variation translates into 

$$\mathit{phi} = \sqrt{\ln\left(CV^2 + 1\right)}$$

so independent of value. For the variance we finally have 

$$s^2 = \left(\exp(\mathit{phi}^2) - 1\right) \cdot \mathit{value}^2$$
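
For readers who want to see where the last two expressions come from, the short derivation below combines the mean and variance of the lognormal distribution given in this section. It is added for illustration and uses only those two formulae.

$$
\begin{aligned}
CV^2 &= \frac{s^2}{\bar{x}^2}
      = \frac{\exp(\phi^2)\left(\exp(\phi^2) - 1\right)\exp(2\xi)}{\exp\left(2\xi + \phi^2\right)}
      = \exp(\phi^2) - 1 \\
\phi &= \sqrt{\ln\left(CV^2 + 1\right)} \\
s^2 &= CV^2 \cdot \bar{x}^2 = \left(\exp(\phi^2) - 1\right) \cdot \bar{x}^2
\end{aligned}
$$

Because $\xi$ cancels in the ratio, the coefficient of variation fixes $\phi$ without reference to the mean.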
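
Table 4: Relationship between the representations for the lognormal distribution. 

| From \ To | Mathematical form | EcoSpold | CMLCA |
|---|---|---|---|
| Mathematical form | - | $\mathit{MeanV} = \exp\left(\xi + \tfrac{1}{2}\phi^2\right)$; $SD95 = \exp(2\phi)$ | $\mathit{value} = \exp\left(\xi + \tfrac{1}{2}\phi^2\right)$; $\mathit{phi} = \phi$ |
| EcoSpold | $\xi = \ln(\mathit{MeanV}) - \tfrac{1}{8}\left(\ln(SD95)\right)^2$; $\phi = \tfrac{1}{2}\ln(SD95)$ | - | $\mathit{value} = \mathit{MeanV}$; $\mathit{phi} = \tfrac{1}{2}\ln(SD95)$ |
| CMLCA | $\xi = \ln(\mathit{value}) - \tfrac{1}{2}\mathit{phi}^2$; $\phi = \mathit{phi}$ | $\mathit{MeanV} = \mathit{value}$; $SD95 = \exp(2 \times \mathit{phi})$ | - |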
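
The lognormal conversions between the mathematical form and EcoSpold can be sketched as follows. This is a minimal illustration with hypothetical function names; it uses the parameters of Fig. 4 to confirm that the two directions are each other's inverse.

```python
import math


def lognormal_math_to_ecospold(xi: float, phi: float) -> dict:
    """Mathematical form (xi, phi) to EcoSpold fields."""
    return {
        "meanValue": math.exp(xi + 0.5 * phi**2),
        "standardDeviation95": math.exp(2.0 * phi),
    }


def lognormal_ecospold_to_math(mean_v: float, sd95: float) -> tuple:
    """EcoSpold MeanV and SD95 back to (xi, phi)."""
    phi = 0.5 * math.log(sd95)
    # Same as ln(MeanV) - (ln(SD95))^2 / 8, since phi^2 = (ln(SD95))^2 / 4.
    xi = math.log(mean_v) - 0.5 * phi**2
    return xi, phi


# Round trip for the distribution of Fig. 4 (xi=1, phi=0.3).
eco = lognormal_math_to_ecospold(1.0, 0.3)
xi, phi = lognormal_ecospold_to_math(eco["meanValue"], eco["standardDeviation95"])
print(eco, xi, phi)
```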



Fig. 5: Screen shot of the software-tool 'Distributions' for the lognormal distribution using the EcoSpold representation with the parameters meanValue = 10 
and standardDeviation95 = 1.2 


3 Software and Example 

An easy-to-use software tool has been developed to assist
users of ecoinvent and/or CMLCA in translating and
interpreting distributions and their parameters in the three
representations discussed here. Fig. 5 shows a screen shot of the user
interface. The software tool can be downloaded from
http://www.ecoinvent.net/en/uncertainty.htm and from
http://www.leidenuniv.nl/cml/ssp/software/cmlca/distributions.html. 

To illustrate the use of the tables and the software tool, we
give an example. Suppose we have a data item that has been
specified in EcoSpold as uncertaintyType=1; meanValue=10;
standardDeviation95=1.2. To translate this into CMLCA
form, we use from Table 4 in Section 2.4 the formulae in the
EcoSpold row and the CMLCA column (from EcoSpold to
CMLCA). These formulae are: 

$$\mathit{value} = \mathit{MeanV}$$

$$\mathit{phi} = \tfrac{1}{2}\ln(SD95)$$

Upon entering the values for meanValue and
standardDeviation95, we find value=10 and phi=0.0912. In the
software tool, one selects the lognormal distribution and the
EcoSpold representation and clicks 'Edit parameters'. The
values 10 and 1.2 are entered respectively. Then one changes
the representation into CMLCA, and reads from the small
table in the bottom left corner the values for 'value' and 'phi': 
10 and 0.0912 respectively. 
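
The same example can be reproduced in a few lines of Python. The sketch below is independent of the 'Distributions' tool and is offered only as a cross-check. It also draws random values with the standard library's `random.lognormvariate`, whose arguments are the mean and standard deviation of ln(x), to confirm that the sample mean is close to meanValue. The seed and sample size are arbitrary.

```python
import math
import random
import statistics

# EcoSpold input: uncertaintyType=1 (lognormal)
mean_v, sd95 = 10.0, 1.2

# EcoSpold -> CMLCA (Table 4)
value = mean_v
phi = 0.5 * math.log(sd95)

# CMLCA -> mathematical form (Table 4)
xi = math.log(value) - 0.5 * phi**2

print(f"value = {value:g}, phi = {phi:.4f}")   # value = 10, phi = 0.0912

# Monte Carlo cross-check: the arithmetic mean of the draws approaches 10.
rng = random.Random(2004)   # arbitrary fixed seed
samples = [rng.lognormvariate(xi, phi) for _ in range(100_000)]
sample_mean = statistics.fmean(samples)
print("sample mean:", sample_mean)
```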

4 Discussion 

The importance of including uncertainty information into
LCA has been recognized for more than a decade; see de
Beaufort et al. (2003) for a review. Two main lines can be
distinguished: the use of data quality indicators, and the use
of statistical measures of dispersion, like standard deviations. 
A clear advantage of using data quality indicators is the
possibility to capture uncertainty-related information that
is difficult to quantify, such as the degree of data validation. 
An obvious advantage of quantitative information is the
possibility to use methods from mathematical statistics to
assess the uncertainty over the entire life cycle. Especially in
large databases and advanced computer programs, the
latter type of analysis may be used for automatic uncertainty
and sensitivity analyses. Experiences gained with ecoinvent
Data v1.1 showed that primary information on variability
and parameter uncertainty of unit processes due to, e.g.,
measurement uncertainties, process-specific variations and temporal
variations is hardly available. A standardised procedure
based on data quality indicators has been applied to
overcome this shortcoming (Frischknecht and Heck 2004). 

The EcoSpold format is an important and widely-used
standard for exchanging and reporting inventory data. There are
other data formats as well. Perhaps the most important one
is the one provided by ISO 14048 (Anonymous 2002).
Fortunately, it contains fields (1.2.12) for including statistical
information. But being primarily a data reporting format, it
does not standardize the statistical vocabulary. This may lead
to ambiguous and defective processing of the data files by
software for LCA. The examples that illustrate the ISO/TS
14048 show that the field for name (ISO/TS 1.2.12.1) can
be filled in many ways ('mean', 'mode', 'range', 'single point'
are explicitly mentioned in ISO/TS 14048, Section 7.3, but
the nomenclature is not mandatory), and that the same
applies to the name of the parameter field ('Coefficient of
variance', 'Maximum value', 'Mean', 'Median', 'Minimum value',
'Sample size', 'Standard deviation', 'Estimated error'). To
our regret, it is not possible to extend the translation that
we provided between mathematics, EcoSpold and CMLCA
to the ISO/TS 14048 data documentation format, unless a
precise definition of the parameters that are supposed to
represent the distributions has been established. 

Interpretation of uncertainty information in data and results
is an indispensable part of sound decision making and should
be an integral part of the analysis itself. We hope that the
present exposition stimulates and helps LCA practitioners
to apply uncertainty analyses in their practice of using LCA. 
Moreover, we hope that suppliers of LCA databases and
software will take care to include uncertainty information
and processing in their products. We invite these other
suppliers to provide their parameter representations in a clear
and unambiguous way, so that tables with translation
formulae like the four above may be constructed. We also hope
that a future revision of the ISO/TS 14048 document will
put forward a standardized representation and terminology
for statistical information. To what extent the
representations should be standardized is an open question. 